1. Intro:

As a first step in the analysis of treatment effects, we will look at differential expression using LIMMA. Before doing so, we will use SVA to learn usable representations of the batch effects in the data. The resulting surrogate variables can be passed directly to LIMMA to correct for confounders. We will first keep the treatments separate; once we have an understanding of the similarity between treatments, we will regroup them into contrasts and look for common effects.

2. Data Used:

Metadata for this report are available in:

Uncorrected imputed data can be found in:

3. Surrogate Variable Analysis

The SVA algorithm is designed to find effects in the data that cannot be explained by the treatments. This lets us go beyond the known batches and also capture effects that occur within a batch. Once we have these surrogate variables, we will include them directly in our design matrix.
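As a rough illustration of what "including them directly in the design matrix" means, the sketch below appends surrogate variable columns to a treatment indicator matrix. It is not the code used for this report: the treatment labels (`trtA`, `trtB`), the surrogate variable values, and the helper `build_design` are all hypothetical. In R this is commonly done by column-binding the surrogate variables to the treatment model matrix, but the report does not show its own code.

```python
# Sketch only: append surrogate variables (SVs) to a treatment design matrix.
# Treatment labels and SV values are made up for illustration.

def build_design(treatments, surrogate_variables, reference):
    """Return (column_names, rows) for an intercept + treatment + SV design."""
    # One indicator column per non-reference treatment level.
    levels = sorted(set(treatments) - {reference})
    n_sv = len(surrogate_variables[0])
    names = ["Intercept"] + levels + [f"SV{i + 1}" for i in range(n_sv)]
    rows = []
    for trt, svs in zip(treatments, surrogate_variables):
        indicators = [1.0 if trt == lvl else 0.0 for lvl in levels]
        rows.append([1.0] + indicators + list(svs))
    return names, rows

# Six hypothetical samples: two controls and two replicates of each treatment.
treatments = ["control", "control", "trtA", "trtA", "trtB", "trtB"]
svs = [
    [0.31, -0.12], [0.28, 0.05],
    [-0.40, 0.22], [-0.35, -0.18],
    [0.09, 0.41], [0.07, -0.38],
]

names, design = build_design(treatments, svs, reference="control")
for row in design:
    print(row)
```

Because the surrogate variables enter as ordinary covariates, the treatment coefficients are estimated after adjusting for them, which is what allows within-batch effects to be corrected without naming them in advance.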

## Number of significant surrogate variables is:  4 
## Iteration (out of 5 ):1  2  3  4  5

3.1 Change in surrogate variables across batches

From this graph, it is apparent that all 4 SVs are picking up major batch effects in the data. In addition, there is considerable spread of the variables within each batch, which we believe implies that SVA is capturing even more fine-grained information.

3.2 Known batch effect capture with surrogate variables

Given that SVA works a lot like PCA, we can see it picking up similar batch separation. This plot gives no more information than the previous one; it is simply a different way of visualizing it.

3.3 Correspondence of surrogate variables with run order

Sorting the runs by run order clearly demonstrates the additional effects SVA is pulling out. Much of this is likely due to the amount of imputation occurring per file, but that is fine. The main point is that we did not have to specify these effects, yet they are identified anyway.

4. Differential phosphorylation with limma

The standard in our lab for differential expression is LIMMA. While we tend to use it for standard differential expression against a control, its most useful feature here will be the ability to test custom contrasts. Many of the treatments we used are directly related to each other and can be grouped together. Grouping them may reveal common effects that differ from those of other groupings, and may increase power. To start, we will look at standard differential expression and then determine what it can tell us about treatment similarity.
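To make the idea of a custom contrast concrete, the sketch below treats a contrast as a weighted sum of fitted per-treatment coefficients whose weights sum to zero. The treatment names, coefficient values, and the helper `contrast_estimate` are hypothetical. This only illustrates the arithmetic; it does not reproduce the moderated statistics that LIMMA computes.

```python
# Sketch only: a custom contrast as a weighted sum of fitted coefficients.
# Treatment names and coefficient values are hypothetical.

def contrast_estimate(coefficients, weights):
    """Combine per-treatment coefficients using contrast weights."""
    return sum(weights[name] * coefficients[name] for name in weights)

# Fitted log2 fold changes versus control for a single site (made-up numbers).
coefficients = {"trtA": -1.2, "trtB": -0.8, "trtC": 0.1}

# Average of a hypothetical group (trtA, trtB) minus trtC.
weights = {"trtA": 0.5, "trtB": 0.5, "trtC": -1.0}

estimate = contrast_estimate(coefficients, weights)
print(round(estimate, 3))
```

Averaging related treatments in this way pools their replicates into one comparison, which is where the potential gain in power comes from.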

4.1 Explained variance

## # A tibble: 1 × 1
##   `median(improvement)`
##                   <dbl>
## 1                  25.0

## # A tibble: 3 × 2
##   label      `median(variance)`
##   <fct>                   <dbl>
## 1 Treatments              0.243
## 2 SVs                     0.389
## 3 Full Model              0.614

A large share of the variance in the data is explained by the surrogate variables (median 0.389), which is good. There is definitely an effect left over from the treatments (median 0.243), and that is what we will be interested in next.

4.2 Distribution of P-Values at treatment level

The P-value distribution looks good, so results have been written to:

  • output/limma_diffential_expression_results.csv

4.3 Determination of significant hits

This volcano plot still seems a little off. There are large effects which have very low q-values, which makes the volcano plot look very flat. We will carry on for now and experiment with the testing later to determine whether more can be done.
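For reference, the sketch below shows one common way that q-values and the vertical axis of a volcano plot are derived from raw p-values. It assumes Benjamini-Hochberg adjustment. The report does not state which adjustment method was used, so this is an assumption, and the p-values are made up.

```python
import math

# Sketch only: Benjamini-Hochberg adjustment, assumed here because the
# report does not state which q-value method was used.

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Made-up p-values for five site/treatment pairs.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.60]
qvalues = bh_adjust(pvalues)

# Volcano plot height for each pair: -log10 of the adjusted value.
volcano_y = [-math.log10(q) for q in qvalues]
print(qvalues)
```

Because neighbouring p-values can map to the same adjusted value, many points may share one height on the plot, which is worth keeping in mind when judging its shape.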

Full data counts:

## # A tibble: 1 × 3
##   nTreatmentSitePairs nSites nProteins
##                 <int>  <int>     <int>
## 1              529400   5294      1390

Regulated counts:

## # A tibble: 1 × 3
##   nTreatmentSitePairs nSites nProteins
##                 <int>  <int>     <int>
## 1               25182   4477      1281

Regulated 5 min counts:

## # A tibble: 1 × 3
##   nTreatmentSitePairs nSites nProteins
##                 <int>  <int>     <int>
## 1               21954   4039      1202

4.4 Differentially regulated sites per treatment

## Warning: Removed 1 rows containing missing values (position_stack).
## Removed 1 rows containing missing values (position_stack).
## Removed 1 rows containing missing values (position_stack).

As we have seen in other analyses, the SP condition clearly stands out from the rest.

## Warning: Removed 1 rows containing missing values (geom_point).

Looking at the above plot, it is clear that almost all of the regulatory effect is down-regulation. Some up-regulation is definitely present, but it makes up a minority of the effect.

There was some worry that the overabundance of down-regulation could be an artifact of imputation. However, the over-representation of down-regulation relative to up-regulation is anti-correlated with imputation, which seems to imply that this is a real effect of highly expressed sites.
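The kind of check described above can be sketched as a simple Pearson correlation between the amount of imputation and the fraction of regulated sites that are down-regulated. The level at which the report computed this (per treatment, per site, or per file) is not stated, so the per-treatment framing, the helper `pearson`, and all numbers below are assumptions for illustration only.

```python
import math

# Sketch only: Pearson correlation between imputation and down-regulation.
# The per-treatment framing and all values are hypothetical.

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Fraction of imputed values and fraction of regulated sites that are
# down-regulated, for four hypothetical treatments.
imputed_fraction = [0.05, 0.10, 0.20, 0.40]
down_fraction = [0.90, 0.85, 0.70, 0.55]

r = pearson(imputed_fraction, down_fraction)
print(r < 0)
```

A negative coefficient means the excess of down-regulation shrinks as imputation increases, which is the opposite of what an imputation artifact would produce.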

4.5 Amount of regulation by site

## Warning: Removed 15 rows containing missing values (geom_smooth).
## Removed 15 rows containing missing values (geom_smooth).

## Warning: Removed 24 rows containing missing values (geom_smooth).
## Removed 24 rows containing missing values (geom_smooth).

## Warning: Removed 59 rows containing missing values (geom_smooth).
## Removed 59 rows containing missing values (geom_smooth).

The above plots show that there is a break point in the number of treatments in which sites are regulated. This seems to imply that some sites are more prone to being regulated across many treatments, possibly because of their importance.

4.6 Similarity of effects between treatments

The above plots give a great view into the overlap between conditions. To produce them, we simply counted the number of overlapping significant coefficients between each pair of conditions. One thing that is apparent is that a large portion of treatments don’t have much going on. There is a good chance that the coefficients discovered for these treatments are false positives, but further investigation is warranted to determine whether there are any robust but unique effects. The other large portion of treatments have high overlap with each other. There are at least two apparent clusters, with distinct patterns of connection between them. In the next section, I will try to use certain subsets of the data to tease apart effects.
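The counting step described above can be sketched as follows: for every pair of treatments, count the sites that are significant in both. The treatment names, site identifiers, and the helper `overlap_counts` are hypothetical, and the significance calls are assumed to have been made already.

```python
from itertools import combinations

# Sketch only: pairwise overlap of significant sites between treatments.
# Treatment names and site identifiers are hypothetical.

def overlap_counts(significant_sites):
    """Count shared significant sites for every pair of treatments."""
    counts = {}
    for a, b in combinations(sorted(significant_sites), 2):
        counts[(a, b)] = len(significant_sites[a] & significant_sites[b])
    return counts

# Sets of sites called significant in each treatment.
significant_sites = {
    "trtA": {"site1", "site2", "site3"},
    "trtB": {"site2", "site3", "site4"},
    "trtC": {"site9"},
}

counts = overlap_counts(significant_sites)
for pair, n_shared in counts.items():
    print(pair, n_shared)
```

Raw counts favour treatments with many hits, so a normalized measure such as the Jaccard index may be worth considering when comparing treatments with very different numbers of regulated sites.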

5. Investigation of treatment group similarities and differences

5.1 Osmostressors

5.2 pH stress

5.3 Carbon stress